The NICT translation system for IWSLT 2012
نویسندگان
چکیده
This paper describes NICT’s participation in the IWSLT 2011 evaluation campaign for the TED speech translation ChineseEnglish shared-task. Our approach was based on a phrasebased statistical machine translation system that was augmented in two ways. Firstly we introduced rule-based re-ordering constraints on the decoding. This consisted of a set of rules that were used to segment the input utterances into segments that could be decoded almost independently. This idea here being that constraining the decoding process in this manner would greatly reduce the search space of the decoder, and cut out many possibilities for error while at the same time allowing for a correct output to be generated. The rules we used exploit punctuation and spacing in the input utterances, and we use these positions to delimit our segments. Not all punctuation/spacing positions were used as segment boundaries, and the set of used positions were determined by a set of linguistically-based heuristics. Secondly we used two heterogeneous methods to build the translation model, and lexical reordering model for our systems. The first method employed the popular method of using GIZA++ for alignment in combination with phraseextraction heuristics. The second method used a recentlydeveloped Bayesian alignment technique that is able to perform both phrase-to-phrase alignment and phrase pair extraction within a single unsupervised process. The models produced by this type of alignment technique are typically very compact whilst at the same time maintaining a high level of translation quality. We evaluated both of these methods of translation model construction in isolation, and our results show their performance is comparable. We also integrated both models by linear interpolation to obtain a model that outperforms either component. Finally, we added an indicator feature into the log-linear model to indicate those phrases that were in the intersection of the two translation models. The addition of this feature was also able to provide a small improvement in performance.
منابع مشابه
Minimum Bayes-Risk decoding extended with similar examples: NAIST-NICT at IWSLT 2012
This paper describes our methods used in the NAIST-NICT submission to the International Workshop on Spoken Language Translation (IWSLT) 2012 evaluation campaign. In particular, we propose two extensions to minimum bayesrisk decoding which reduces a expected loss.
متن کاملTwo methods for stabilizing MERT: NICT at IWSLT 2009
This paper describes the NICT SMT system used in the International Workshop on Spoken Language Translation (IWSLT) 2009 evaluation campaign. We participated in the Challenge Task. Our system was based on a fairly common phrase-based machine translation system. We used two methods for stabilizing MERT.
متن کاملMinimum Bayes-Risk Decoding Extended with Similar Examples: NAIST-NICT at IWSLT
This paper describes our methods used in the NAIST-NICT submission to the International Workshop on Spoken Language Translation (IWSLT) 2012 evaluation campaign. In particular, we propose two extensions to minimum bayesrisk decoding which reduces a expected loss.
متن کاملThe niCT-ATR statistical machine translation system for the IWSLT 2006 evaluation
This paper describes the NiCT-ATR statistical machine translation (SMT) system used for the IWSLT 2006 evaluation compaign. We participated in all four language pair translation tasks (CE, JE, AE and IE) and all two tracks (OPEN and CSTAR). We used a phrase-based SMT in the OPEN track and a hybrid multiple translation engine in the CSTAR track. We also equipped our system with some of new prepr...
متن کاملThe NICT/ATR speech translation system for IWSLT 2008
This paper describes the National Institute of Information and Communications Technology/Advanced Telecommunications Research Institute International (NICT/ATR) statistical machine translation (SMT) system used for the IWSLT 2008 evaluation campaign. We participated in the Chinese– English (Challenge Task), English–Chinese (Challenge Task), Chinese–English (BTEC Task), Chinese–Spanish (BTEC Tas...
متن کاملThe NICT/ATR speech translation system for IWSLT 2007
This paper describes the NiCT-ATR statistical machine translation (SMT) system used for the IWSLT 2007 evaluation campaign. We participated in three of the four language pair translation tasks (CE, JE, and IE). We used a phrase-based SMT system using log-linear feature models for all tracks. This year we decoded from the ASR n-best lists in the JE track and found a gain in performance. We also ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011